Distribution and storage of data on local and remote disks in multi-use clusters of commodity PCs
Author
Abstract
Over the last few decades, the power of personal computers (PCs) has grown steadily, following the exponential growth rate predicted by Moore’s law. The trend towards the commoditization of PC components (such as CPUs, memories, high-speed interconnects and disks) results in a highly attractive price/performance ratio for systems built from those components. Following these trends, I propose to integrate the commodity IT resources of an entire company or organization into multi-use clusters of commodity PCs. These include compute farms, experimental clusters as well as desktop PCs in offices and labs. This thesis follows a bottom-up architectural approach and deals with hardware and system-software architecture with a tight focus on performance and efficiency. In contrast, the Grid view of providing services instead of hardware for storage and computation deals mostly with problems of capability, service and security rather than performance and the modelling thereof. The hard-disk drives in multi-use clusters of commodity PCs hold far more storage than the local operating-system (OS) installation requires, and therefore there is a lot of excess storage in a multi-use cluster. This additional disk space on the nodes should be put to better use for a variety of interesting applications, e.g. for on-line analytic data processing (OLAP). The specific contributions of the thesis include solutions to four important problems of optimized resource usage in multi-use-cluster environments. Analytic models of computer systems are important to understand the performance of current systems and to predict the performance of future systems early in the design stage. The thesis introduces a simple analytic model of data streams in clusters. The model considers the topology of data streams as well as the limitations of the edges and nodes. It also takes into account the limitations of the resources within the nodes through which the data streams pass.
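As a rough illustration of this kind of bottleneck reasoning, the following minimal sketch computes the highest rate at which a single data stream can flow over every edge of a given topology. It is not the model from the thesis: the function names, the three capacity parameters (`link_out`, `link_in`, `node_cap`) and the numbers used in the example are all assumptions chosen for illustration.

```python
from collections import Counter

def max_stream_rate(edges, link_out, link_in, node_cap):
    """Highest rate at which one stream can flow over every edge at once.

    edges    -- list of (src, dst) pairs describing the stream topology
    link_out -- send capacity of each node's network link (assumed equal for all nodes)
    link_in  -- receive capacity of each node's network link (assumed equal for all nodes)
    node_cap -- total stream bandwidth a node can handle internally (assumed)
    """
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    limits = []
    for node in set(out_deg) | set(in_deg):
        o, i = out_deg[node], in_deg[node]
        if o:
            limits.append(link_out / o)   # outgoing copies share the send link
        if i:
            limits.append(link_in / i)    # incoming copies share the receive link
        limits.append(node_cap / (o + i)) # every stream passing through loads the node
    return min(limits)

def star(n):
    """Node 0 sends a separate copy of the stream to each of n clients."""
    return [(0, k) for k in range(1, n + 1)]

def chain(n):
    """Node 0 sends to node 1, which stores and forwards to node 2, and so on."""
    return [(k, k + 1) for k in range(n)]

if __name__ == "__main__":
    # Hypothetical capacities in MB/s, not measurements from the thesis.
    print("star :", max_stream_rate(star(10), 100, 100, 150))
    print("chain:", max_stream_rate(chain(10), 100, 100, 150))
```

Under these assumed capacities the star is limited by the sender's outgoing link, which is shared by all ten copies, while the chain is limited by the internal capacity of the forwarding nodes, each of which handles one incoming and one outgoing stream.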
Using the model, the thesis evaluates different data-casting techniques that can be used to replicate OS installations to many nodes in clusters. The different implementations, based on IP multicast and on star, tree and multi-drop-chain topologies, are evaluated with the analytic model as well as with experimental measurements. As a result of the evaluation, the multi-drop chain is proposed as the most suitable replication technique. When working with multi-use clusters, we noticed that maintenance of the highly replicated system software is difficult, because there are many OS installations in different versions and customisations. Since it is desirable to back up all older versions and customisations of all OS installations, I implemented several techniques to archive the large amounts of highly redundant data contained in the nodes’ OS partitions. The techniques take different approaches to comparing the data, but all are OS independent and work with whole partition images. The block repositories
Similar resources
On the Design and Performance of Remote Disk Drivers for Clusters of PCs
This paper presents the design and performance of remote disk drivers for clusters of Commodity-Off-The-Shelf PCs that fetch disk blocks over System Area Networks. The driver offers a flexible interface, being capable to logically act either as computer- or network-attached storage. It allows for fine-grain remote cache control through exclusive caching. An event-driven asynchronous block deliver...
OS Support for a Commodity Database on PC clusters - Distributed Devices vs. Distributed File Systems
In this paper we attempt to parallelise a commodity database for OLAP on a cluster of commodity PCs by using a distributed high-performance storage subsystem. By parallelising the underlying storage architecture we eliminate the need to make any changes to the database software. We look at two options that differ in their complexity and features: Distributed devices and distributed file systems...
Free Factories: Unified Infrastructure for Data Intensive Web Services
We introduce the Free Factory, a platform for deploying data-intensive web services using small clusters of commodity hardware and free software. Independently administered virtual machines called Freegols give application developers the flexibility of a general purpose web server, along with access to distributed batch processing, cache and storage services. Each cluster exploits idle RAM and ...
Providing Single I/O Space and Multiple Fault Tolerance in a Distributed RAID
Commodity EIDE disks provide low cost storage but are severely limited in bandwidth and cannot be made fault-tolerant. On the other hand, conventional RAID devices provide reliability and performance but worse price/performance figures. A cluster of PCs can be seen as a collection of networked low cost disks; such a collection can be operated by proper software so as to provide the abstraction ...
Performance Evaluation of Local Detectors in the Presence of Noise for Multi-Sensor Remote Sensing Image Matching
Automatic, efficient, accurate, and stable image matching is one of the most critical issues in remote sensing, photogrammetry, and machine vision. In recent decades, various algorithms have been proposed based on the feature-based framework, which concentrates on detecting and describing local features. Understanding the characteristics of different matching algorithms in various applications ...
Publication date: 2003